safe trajectory
Conformal Safety Monitoring for Flight Testing: A Case Study in Data-Driven Safety Learning
Feldman, Aaron O., Harp, D. Isaiah, Duncan, Joseph, Schwager, Mac
We develop a data-driven approach for runtime safety monitoring in flight testing, where pilots perform maneuvers on aircraft with uncertain parameters. Because safety violations can arise unexpectedly as a result of these uncertainties, pilots need clear, preemptive criteria to abort the maneuver in advance of safety violation. To solve this problem, we use offline stochastic trajectory simulation to learn a calibrated statistical model of the short-term safety risk facing pilots. We use flight testing as a motivating example for data-driven learning/monitoring of safety due to its inherent safety risk, uncertainty, and human-interaction. However, our approach consists of three broadly-applicable components: a model to predict future state from recent observations, a nearest neighbor model to classify the safety of the predicted state, and classifier calibration via conformal prediction. We evaluate our method on a flight dynamics model with uncertain parameters, demonstrating its ability to reliably identify unsafe scenarios, match theoretical guarantees, and outperform baseline approaches in preemptive classification of risk.
Boundary-to-Region Supervision for Offline Safe Reinforcement Learning
Su, Huikang, Peng, Dengyun, Zhuang, Zifeng, Liu, YuHan, Chen, Qiguang, Wang, Donglin, Liu, Qinghe
Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry: return-to-go (RTG) serves as a flexible performance target, while cost-to-go (CTG) should represent a rigid safety boundary. This symmetric conditioning leads to unreliable constraint satisfaction, especially when encountering out-of-distribution cost trajectories. To address this, we propose Boundary-to-Region (B2R), a framework that enables asymmetric conditioning through cost signal realignment . B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories while preserving reward structures. Combined with rotary positional embeddings , it enhances exploration within the safe region. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks while achieving superior reward performance over baseline methods. This work highlights the limitations of symmetric token conditioning and establishes a new theoretical and practical approach for applying sequence models to safe RL. Our code is available at https://github.com/HuikangSu/B2R.
Towards Safe Robot Foundation Models Using Inductive Biases
Tรถlle, Maximilian, Gruner, Theo, Palenicek, Daniel, Schneider, Tim, Gรผnster, Jonas, Watson, Joe, Tateo, Davide, Liu, Puze, Peters, Jan
Safety is a critical requirement for the real-world deployment of robotic systems. Unfortunately, while current robot foundation models show promising generalization capabilities across a wide variety of tasks, they fail to address safety, an important aspect for ensuring long-term operation. Current robot foundation models assume that safe behavior should emerge by learning from a sufficiently large dataset of demonstrations. However, this approach has two clear major drawbacks. Firstly, there are no formal safety guarantees for a behavior cloning policy trained using supervised learning. Secondly, without explicit knowledge of any safety constraints, the policy may require an unreasonable number of additional demonstrations to even approximate the desired constrained behavior. To solve these key issues, we show how we can instead combine robot foundation models with geometric inductive biases using ATACOM, a safety layer placed after the foundation policy that ensures safe state transitions by enforcing action constraints. With this approach, we can ensure formal safety guarantees for generalist policies without providing extensive demonstrations of safe behavior, and without requiring any specific fine-tuning for safety. Our experiments show that our approach can be beneficial both for classical manipulation tasks, where we avoid unwanted collisions with irrelevant objects, and for dynamic tasks, such as the robot air hockey environment, where we can generate fast trajectories respecting complex tasks and joint space constraints.
Learning Robot Safety from Sparse Human Feedback using Conformal Prediction
Feldman, Aaron O., Vincent, Joseph A., Adang, Maximilian, Low, Jun En, Schwager, Mac
Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
Offline Safe Reinforcement Learning Using Trajectory Classification
Gong, Ze, Kumar, Akshat, Varakantham, Pradeep
Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints at each time step (derived from global cost constraints) and this can result in either overly conservative policies or violation of safety constraints. In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories. To be specific, we first partition the pre-collected dataset of state-action trajectories into desirable and undesirable subsets. Intuitively, the desirable set contains high reward and safe trajectories, and undesirable set contains unsafe trajectories and low-reward safe trajectories. Second, we learn a policy that generates desirable trajectories and avoids undesirable trajectories, where (un)desirability scores are provided by a classifier learnt from the dataset of desirable and undesirable trajectories. This approach bypasses the computational complexity and stability issues of a min-max objective that is employed in existing methods. Theoretically, we also show our approach's strong connections to existing learning paradigms involving human feedback. Finally, we extensively evaluate our method using the DSRL benchmark for offline safe RL. Empirically, our method outperforms competitive baselines, achieving higher rewards and better constraint satisfaction across a wide variety of benchmark tasks.
CaRT: Certified Safety and Robust Tracking in Learning-based Motion Planning for Multi-Agent Systems
Tsukamoto, Hiroyasu, Riviรจre, Benjamin, Choi, Changrak, Rahmani, Amir, Chung, Soon-Jo
The key innovation of our analytical method, CaRT, lies in establishing a new hierarchical, distributed architecture to guarantee the safety and robustness of a given learning-based motion planning policy. First, in a nominal setting, the analytical form of our CaRT safety filter formally ensures safe maneuvers of nonlinear multi-agent systems, optimally with minimal deviation from the learning-based policy. Second, in off-nominal settings, the analytical form of our CaRT robust filter optimally tracks the certified safe trajectory, generated by the previous layer in the hierarchy, the CaRT safety filter. We show using contraction theory that CaRT guarantees safety and the exponential boundedness of the trajectory tracking error, even under the presence of deterministic and stochastic disturbance. Also, the hierarchical nature of CaRT enables enhancing its robustness for safety just by its superior tracking to the certified safe trajectory, thereby making it suitable for off-nominal scenarios with large disturbances. This is a major distinction from conventional safety function-driven approaches, where the robustness originates from the stability of a safe set, which could pull the system over-conservatively to the interior of the safe set. Our log-barrier formulation in CaRT allows for its distributed implementation in multi-agent settings. We demonstrate the effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfiguration.
Safer Gap: A Gap-based Local Planner for Safe Navigation with Nonholonomic Mobile Robots
Feng, Shiyu, Abuaish, Ahmad, Vela, Patricio A.
This paper extends the gap-based navigation technique in Potential Gap by guaranteeing safety for nonholonomic robots for all tiers of the local planner hierarchy, so called Safer Gap. The first tier generates a Bezier-based collision-free path through gaps. A subset of navigable free-space from the robot through a gap, called the keyhole, is defined to be the union of the largest collision-free disc centered on the robot and a trapezoidal region directed through the gap. It is encoded by a shallow neural network zeroing barrier function (ZBF). Nonlinear model predictive control (NMPC), with Keyhole ZBF constraints and output tracking of the Bezier path, synthesizes a safe kinematically-feasible trajectory. Low-level use of the Keyhole ZBF within a point-wise optimization-based safe control synthesis module serves as a final safety layer. Simulation and experimental validation of Safer Gap confirm its collision-free navigation properties.
An NCAP-like Safety Indicator for Self-Driving Cars
Huang, Jimy Cai, Kurniawati, Hanna
This paper proposes a mechanism to assess the safety of autonomous cars. It assesses the car's safety in scenarios where the car must avoid collision with an adversary. Core to this mechanism is a safety measure, called Safe-Kamikaze Distance (SKD), which computes the average similarity between sets of safe adversary's trajectories and kamikaze trajectories close to the safe trajectories. The kamikaze trajectories are generated based on planning under uncertainty techniques, namely the Partially Observable Markov Decision Processes, to account for the partially observed car policy from the point of view of the adversary. We found that SKD is inversely proportional to the upper bound on the probability that a small deformation changes a collision-free trajectory of the adversary into a colliding one. We perform systematic tests on a scenario where the adversary is a pedestrian crossing a single-lane road in front of the car being assessed --which is, one of the scenarios in the Euro-NCAP's Vulnerable Road User (VRU) tests on Autonomous Emergency Braking. Simulation results on assessing cars with basic controllers and a test on a Machine-Learning controller using a high-fidelity simulator indicates promising results for SKD to measure the safety of autonomous cars. Moreover, the time taken for each simulation test is under 11 seconds, enabling a sufficient statistics to compute SKD from simulation to be generated on a quad-core desktop in less than 25 minutes.
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require collection of abundant supervision for training. Safe exploration algorithms enable this data collection to proceed safely even when only partial knowledge is available. In this paper, we present a new episodic framework to design a sub-optimal pool of motion plans that aid exploration for learning unknown residual dynamics under safety constraints. We derive an iterative convex optimization algorithm that solves an information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC), subject to chance constraints and approximated dynamics to compute a safe trajectory. The optimization objective encodes both performance and exploration, and the safety is incorporated as distributionally robust chance constraints.
System prevents speedy drones from crashing in unfamiliar areas
Autonomous drones are cautious when navigating the unknown. Now MIT researchers have developed a trajectory-planning model that helps drones fly at high speeds through previously unexplored areas, while staying safe. The model -- aptly named "FASTER" -- estimates the quickest possible path from a starting point to a destination point across all areas the drone can and can't see, with no regard for safety. But, as the drone flies, the model continuously logs collision-free "back-up" paths that slightly deviate from that fast flight path. When the drone is unsure about a particular area, it detours down the back-up path and replans its path. The drone can thus cruise at high speeds along the quickest trajectory while occasionally slowing down slightly to ensure safety.